Many modern data-intensive computational problems either require, or benefit from, distance or similarity data that adhere to a metric: the algorithms run faster or have better performance guarantees. Unfortunately, in real applications the data are messy and the values are noisy, so the distances between the data points are far from satisfying a metric. Indeed, there are a number of different algorithms for finding the set of distances closest to the given ones that also satisfy a metric (sometimes with the extra condition of being Euclidean). These algorithms can have unintended consequences: they can change a large number of the original data points and alter many other features of the data.

The goal of sparse metric repair is to make as few changes as possible to the original data set or underlying distances so as to ensure that the resulting distances satisfy the properties of a metric. In other words, we seek to minimize the sparsity (the $\ell_0$ "norm") of the changes we make to the distances, subject to the new distances satisfying a metric. We give three different combinatorial algorithms to repair a metric sparsely. In one setting the algorithm is guaranteed to return the sparsest solution; in the other settings, the algorithms repair the metric but need not return the sparsest repair. Without prior information, the algorithms run in time proportional to the cube of the number of input data points; with prior information, the running time can be reduced considerably.
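As a point of orientation (not one of the paper's three algorithms), the simplest special case of metric repair is the decrease-only one: if entries may only shrink, running all-pairs shortest paths over the dissimilarity matrix yields distances that satisfy the triangle inequality, in the cubic time mentioned above. The sketch below illustrates this with a hypothetical 4-point example containing one inflated distance; the `decrease_only_repair` and `triangle_violations` names are ours, for illustration only.

```python
import numpy as np

def decrease_only_repair(D):
    """Repair a symmetric dissimilarity matrix by only decreasing
    entries, via Floyd-Warshall all-pairs shortest paths in O(n^3).
    The resulting distances satisfy the triangle inequality."""
    D = D.astype(float).copy()
    n = D.shape[0]
    for k in range(n):
        # Relax every pair (i, j) through intermediate point k.
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D

def triangle_violations(D):
    """Count ordered triples (i, j, k) with D[i,j] > D[i,k] + D[k,j]."""
    n = D.shape[0]
    return sum(
        D[i, j] > D[i, k] + D[k, j] + 1e-12
        for i in range(n) for j in range(n) for k in range(n)
    )

# A 4-point example: the single entry D[0,3] (and its mirror D[3,0])
# is inflated to 5, breaking the triangle inequality.
D = np.array([[0., 1., 1., 5.],
              [1., 0., 1., 1.],
              [1., 1., 0., 1.],
              [5., 1., 1., 0.]])
R = decrease_only_repair(D)
```

Here the repair is also sparse: only the two symmetric broken entries are lowered (to the shortest-path value 2), and every other distance is untouched. In general, though, shortest paths need not give the sparsest repair, which is exactly the gap the combinatorial algorithms in this work address.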